This lab focuses on creating maps to display spatial data. It covers both static maps (created using ggplot2) and interactive maps (created using leaflet).

Directions (Please read before starting)

  1. Please work together with your assigned partner. Make sure you both fully understand each concept before you move on.
  2. Please record your answers and any related code for all embedded lab questions. I encourage you to try out the embedded examples, but you shouldn’t turn them in.
  3. Please ask for help, clarification, or even just a check-in if anything seems unclear.

\(~\)

Preamble

Packages and Datasets

This lab will use both the ggplot2 and leaflet packages. It will also require the dplyr package for merging and joining, the maps package, which contains the spatial information needed to create a variety commonly used maps, and maptools, a package used to process and read certain spatial data files.

# load the following packages
# install.packages("leaflet")
# install.packages("maps")
# install.packages("maptools")
library(leaflet)
library(ggplot2)
library(dplyr)
library(maps)
library(maptools)

\(~\)

Preamble

There are three main geometric elements used to represent spatial data:

  1. Points - simple coordinates showing the position of on an object (ie: cities, crash sites, etc.)
  2. Lines - ordered sets of coordinates that can be connected (ie: roadways, power lines, etc.)
  3. Polygons - ordered sets of coordinates that form an enclosed shape when connected (ie: states, congressional districts, etc.)

\(~\)

Choropleth Maps

A choropleth map uses colored polygons to represent values corresponding to geographic units.

The first step in creating a choropleth map is creating the polygons that define each geographic unit:

## Get state polygons from the "maps" package
state_polys <- map_data("state")  

## Graph these polygons
ggplot(data = state_polys, aes(x = long, y = lat, group = group)) + 
  geom_polygon(color = "black", fill = "lightblue")

In state_polys each “region” has a series of ordered vertices, each with its own longitude and latitude. The “group” and “order” of these vertices is used to draw the edges of that region’s polygon.

To make a choropleth map, we must color each polygon according to a variable. This requires joining new data with the data frame that contains the polygons:

## USArrests data from the "datasets" package
data("USArrests")

## Create a data frame to join
state_values <- data.frame(State = tolower(rownames(USArrests)), 
                           MurderRate = USArrests$Murder)

## Add MurderRate to the state polygons
full_data <- inner_join(x = state_polys, y = state_values, by = c("region" = "State"))

## Color the polygons using a fill variable
ggplot(data = full_data, aes(x = long, y = lat, group = group, fill = MurderRate)) +
                        geom_polygon(color = "black")

\(~\)

Adding points or lines

Because both polygons and lines/points use the “x” and “y” aesthetics, it is necessary to use local specification if you’d like to create a map involving multiple types of spatial features:

data("us.cities") ## Data on 1005 US cities

ggplot() +
   geom_polygon(data = full_data, aes(x = long, y = lat, group = group, fill = MurderRate), color = "black") +
   geom_point(data = us.cities, aes(x = long, y = lat), size = 0.1, color = "red")

The example above demonstrates this by adding various US cities to the murder rate choropleth map we previously created (notice how different data arguments are used within geom_polygon() and geom_point()).

\(~\)

Lab

Practicing with polygons and points

For this practice question you should use the “state” and “county” polygons found in the maps package, as well as the us.cities data frame:

state_polys <- map_data("state")
county_polys <- map_data("county")
data("us.cities")

Question #1: Filter the us.cities data to include only state capitals (capital == 2) in the contiguous United States (exclude cities in AK and HI). Then create a map displaying state polygons (defined by black borders with fill = NA, size = 0.3 and alpha = 0.3) and counties (defined by blue borders with fill = "white", size = 0.1 and alpha = 0.5) that also includes state capitals as points (with color = "yellow"). To exploit layering, you should add the county polygons first and the cities last. Your end result should be similar to the map shown below.

\(~\)

Shapefiles

The polygon boundaries given by map_data() are easy to understand, but they lack the flexibility provided by more complex spatial storage structures. In most applications, geographic information is stored in a shapefile, a complex format that must contain the following components:

  • .shp - file containing geometry of all features
  • .shx - file indexing the geometry
  • .dbf - file storing certain feature attributes in a tabular format (things like population, area, etc.)

To practice working with a shapefile, you should download and extract this folder, which was originally obtained from Natural Earth Data). You should notice it contains .shp, .shx, and .dbf files (along with a few others).

To begin, we’ll load the shapefile using the readShapeSpatial() function within the maptools package:

world <- readShapeSpatial("C:/Users/millerry/Downloads/world_shape/ne_50m_admin_0_countries")  ## You will need to change this file path

Notice we used a local path to the shapefiles, and we omitted the .shp/.shx/.dbf file extensions.

Using str() to learn about the structure of world, we see it contains 5 “slots”, which are components we can reference using @ (similar to $ for data.frame objects):

str(world, max.level = 2)  ## Only print the first 2 levels
## Formal class 'SpatialPolygonsDataFrame' [package "sp"] with 5 slots
##   ..@ data       :'data.frame':  242 obs. of  168 variables:
##   .. ..- attr(*, "data_types")= chr [1:168] "C" "N" "N" "C" ...
##   ..@ polygons   :List of 242
##   ..@ plotOrder  : int [1:242] 240 76 203 17 196 211 226 182 134 232 ...
##   ..@ bbox       : num [1:2, 1:2] -180 -90 180 83.6
##   .. ..- attr(*, "dimnames")=List of 2
##   ..@ proj4string:Formal class 'CRS' [package "sp"] with 1 slot
##   ..$ comment: chr "FALSE"

Unfortunately, ggplot cannot work directly with shapefiles, so we must use the fortify() function to convert them to a workable format:

world_polys <- fortify(world)  ## Converts the shapefile into a data frame of polygons
ggplot() +  geom_polygon(data = world_polys, aes(x = long, y = lat, group = group), fill = "white", color = "blue")

Note: fortify() has been replaced by the tidy() function in the broom package - but you can still use it (at least for now).

Neither fortify() nor tidy() will attach anything from your shapefile’s “data” slot to the data frame it creates. To do this, you must create your own “id” variable that matches the ordering of the “id” that was automatically generated:

# Create a new column called "id"
world_data <- mutate(world@data, id=as.character(0:241))

# Join the tidied shapefile with the data from our original shapefile.
world_polys_data <- left_join(world_polys, world_data, by = "id")

# Test it out by seeing if India, Brazil and Canada appear w/ different fill colors
ggplot() +  geom_polygon(data = world_polys_data, 
                         aes(x = long, y = lat, group = group, 
                             fill = ifelse(NAME_EN %in% c("India", "Brazil", "Canada"), 
                                           "yes", "no")), 
                           color = "black") + guides(fill = FALSE)

Question #2: The variable “GDP_MD” (already present in the merged data frame created above) records the gross domestic product (GDP) in millions of US dollars for the year 2019. For this question, create a choropleth that colors each country from white to green according to its GDP. Hint: you should use the scale_fill_continuous() function.

\(~\)

Leaflet

Another popular tool used to create maps is leaflet, an open-source JavaScript library used by many professional organizations.

The base layer of a leaflet map is a set of map tiles. Additional layers, such as polygons, lines, or point markers can be added via piping:

## Just the base layer
leaflet() %>%
          addTiles()
## Base layer plus polygons from the world shapefile
leaflet(world) %>%
          addTiles() %>% addPolygons()

Note that leaflet() uses shapefiles directly, which is different from ggplot.

\(~\)

Leaflet map tiles

By default, addTiles() will generate a base map using OpenStreetMap. This works great for the vast majority of mapping applications, but if you’d like something different Leaflet offers a list of alternative tile sets in the providers object. These can be used via the addProviderTiles() function:

## List of provider tile sets
head(providers)
## $OpenStreetMap
## [1] "OpenStreetMap"
## 
## $OpenStreetMap.Mapnik
## [1] "OpenStreetMap.Mapnik"
## 
## $OpenStreetMap.DE
## [1] "OpenStreetMap.DE"
## 
## $OpenStreetMap.CH
## [1] "OpenStreetMap.CH"
## 
## $OpenStreetMap.France
## [1] "OpenStreetMap.France"
## 
## $OpenStreetMap.HOT
## [1] "OpenStreetMap.HOT"
## Example ()
leaflet() %>%
          addProviderTiles(providers$HERE.satelliteDay)

\(~\)

Leaflet polygons

In leaflet, polygon attributes are controlled by the arguments of addPolygons(). Of particular importance is the fillColor argument, which requires a vector of colors with a length equal to the number of polygons.

The code below demonstrates how to use the fillColor to color each country by the base-ten log of “POP_EST” using the “magma” color palate:

## Vector of values to color by
col_vals <- log10(world@data$POP_EST)
col_vals[is.infinite(col_vals)] <- NA   ## Replace infinities with NA

## Create a color palette (using the "magma" color scheme)
pal <- colorNumeric("magma", domain = col_vals)

leaflet(world) %>% addTiles() %>% 
      addPolygons(fillColor = pal(col_vals),  weight = 1) %>%
      addLegend(pal = pal, values = col_vals,
            title = "Estimated population (log10)", position = "bottomleft", na.label = "Missing")

Leaflet will not generate a color legend by default, so in the example above we created our own using the addLegend() function.

Question #3: Use Leaflet to create a choropleth map displaying the variable “GDP_MD” using the “RdYlBu” color palette. Your map should include an appropriately formatted legend and use the “Stamen.Toner” provider tile set.

Labels and popups

There are two ways to add interactive information to a Leaflet map:

  • Labels appear when the tool-tip hovers over the polygon
  • Popups appear when the polygon is clicked on

Similar to plotly, Leaflet allows the use of HTML commands:

## Set up the labels and popup
myLabels <- paste("<strong>", world@data$NAME_EN, "</strong>", "<br/>", 
                   "Population:", world@data$POP_EST)
myPopups <- paste("Income Group", world@data$INCOME_GRP)


## Add to map (using "highlight" argument)
leaflet(world) %>% addTiles() %>% 
      addPolygons(fillColor = pal(col_vals),
                  weight = 1,
                  highlight = highlightOptions(
                     weight = 3,
                     color = "grey",
                     fillOpacity = 0.7,
                     bringToFront = TRUE),
                     label = lapply(myLabels, htmltools::HTML),
                     popup = myPopups) %>%
  addLegend(pal = pal, values = col_vals,
            title = "Estimated population (log10)", position = "bottomleft", na.label = "Missing")

The highlight argument dictates how a selected polygon should be highlighted (in this case it will be brought to the front with a grey border and an opaque fill)

Additionally, for labels/popup text with HTML commands to render properly, the HTML() function in the htmltools package needs to be applied to each element of the vector containing the labels (achieved by the lapply() function).

\(~\)

Leaflet polygons using external data

Often we’d like to combine multiple data sources when creating a map. For example, we might want to use the “Happy Planet” data to create an interactive choropleth map. To accomplish this, we must pay special attention to how the external data is merged with the @data component of our shapefile.

hp <- read.csv("https://remiller1450.github.io/data/HappyPlanet.csv")  ## contains 143 countries

### find any non-matches using anti_join
non_matches <- anti_join(x = hp, y = world@data, by = c("Country" = "NAME_EN"))
non_matches$Country 
## [1] "Burma"                   "China"                  
## [3] "Congo"                   "Congo, Dem. Rep. of the"
## [5] "Korea"                   "Macedonia"

Notice there are 6 countries in the Happy Planet data whose names do not match any of the polygons in world; however, this is not because those countries do not exist within world (China as an example). We will rectify the issue by modifying the country names inside the Happy Planet data set:

## Repair names (manually)
hp$Country[hp$Country == "China"] <- "People's Republic of China"
hp$Country[hp$Country == "Congo, Dem. Rep. of the"] <- "Democratic Republic of the Congo"
hp$Country[hp$Country == "Congo"] <- "Republic of the Congo"
hp$Country[hp$Country == "Korea"] <- "South Korea"
hp$Country[hp$Country == "Burma"] <- "Myanmar"
hp$Country[hp$Country == "Macedonia"] <- "North Macedonia"

### check for non-matches (none)
anti_join(x = hp, y = world@data, by = c("Country" = "NAME_EN"))
##  [1] Country        Region         Happiness      LifeExpectancy Footprint     
##  [6] HLY            HPI            HPIRank        GDPperCapita   HDI           
## [11] Population    
## <0 rows> (or 0-length row.names)

We’re now ready to join the Happy Planet data with our shapefile and create a map:

world <- readShapeSpatial("C:/Users/millerry/Downloads/world_shape/ne_50m_admin_0_countries") ## Reload before overwriting
world@data <- left_join(x = world@data, y = hp, by = c("NAME_EN" = "Country"))

## Vector of values to color by
col_vals <- world@data$Happiness
col_vals[is.infinite(col_vals)] <- NA   ## Replace infinities with NA
pal <- colorNumeric("viridis", domain = col_vals)

## Add to map (using "highlight" argument)
leaflet(world) %>% addTiles() %>% 
      addPolygons(fillColor = pal(col_vals),
                  weight = 1) %>%
  addLegend(pal = pal, values = col_vals,
            title = "Happiness", position = "bottomleft")

Question 4: The world.cities data frame contained within the maps package includes the locations and populations of more than 43,000 world cities. The code below creates a filtered version of these data that contains only capitals with a population greater than 500,000. For this question, join the information in this filtered data set with the @data component of the world shapefile. To do so, you will need to manually repair the names of 7 countries. Next, use the information from world.cities to color all countries with a capital city population greater than 500,000 in red.

data("world.cities")
caps <- subset(world.cities, capital == 1 & pop > 500000)

Shown below is an example of this map (yours can use different colors):

\(~\)

Leaflet markers

The addMarkers() function is the most common way to display geographic points on a leaflet map. At minimum, a marker only needs a latitude and longitude. However, markers are most effective when combined with labels and/or popups:

data("us.cities")
leaflet(data = filter(us.cities, pop > 500000)) %>%
          addTiles() %>%
          addMarkers(lng = ~long, lat = ~lat,
                     label = ~paste(name),                ## Hover on a marker to see the city name
                     popup = ~paste("Population:", pop))  ## Click on a marker to see the population

In this example, we filtered us.cities to only include cities with large populations so that we could avoid plotting too many markers in a small geographic area.

However, we could accommodate all 1005 cities by using cluster markers:

leaflet(data = us.cities) %>%
          addTiles() %>%
          addMarkers(lng = ~long, lat = ~lat,
                     label = ~paste(name),
                     popup = ~paste("Population:", pop),
                   clusterOptions = markerClusterOptions())

Or by using circle markers with a low opacity to make over-plotting less of an issue:

leaflet(data = us.cities) %>%
          addTiles() %>%
          addCircleMarkers(lng = ~long, lat = ~lat,
                     label = ~paste(name),
                     popup = ~paste("Population:", pop),
                     fillOpacity = 0.20,
                     color = "yellow",
                     stroke = FALSE)

In the example above, the argument fillOpacity = 0.2 makes each marker 80% transparent, while stroke = FALSE removes the solid outline around each marker.

Question #5: Using the world.cities data, create a map where each city is depicted by a circle marker that displays the city’s name and country when you hover over it, and the city’s population when you click on it. Additionally, your map should color capital cities red and non-capitals yellow and you should use clusterOptions = markerClusterOptions() to display clusters when zoomed out.

Shown below is an example of this map (yours should appear similar to it):